-
Notifications
You must be signed in to change notification settings - Fork 1.9k
C++: Generate IR for assertions in release builds #21142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedAssertion.qll
Fixed
Show fixed
Hide fixed
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedStmt.qll
Fixed
Show fixed
Hide fixed
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedStmt.qll
Fixed
Show fixed
Hide fixed
0cf33ff to
20c0239
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for generating IR for assertions in C++ release builds where they would normally be compiled out. The implementation parses macro arguments from assertion macros (like assert and __analysis_assume) to synthesize the conditional checks that would have been present in debug builds, helping CodeQL remove false positives by understanding assertion constraints.
Changes:
- Adds new
TranslatedAssertion.qllmodule that parses assertion macro arguments and generates synthetic IR for simple comparisons - Updates
TranslatedStmt.qllandTranslatedElement.qllto exclude assertion statements from normal translation - Adds new instruction tags and test coverage for assertion IR generation
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedAssertion.qll | New module implementing assertion parsing and IR generation for simple comparison assertions |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedStmt.qll | Excludes assertion statements from normal translation to avoid duplication |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedElement.qll | Integrates assertion translation into the element translation framework |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/InstructionTag.qll | Adds four new instruction tags for assertion IR generation |
| cpp/ql/lib/semmle/code/cpp/stmts/Stmt.qll | Adds override for getEnclosingFunction() in StmtParent base class |
| cpp/ql/lib/semmle/code/cpp/exprs/Expr.qll | Adds override keyword to getEnclosingFunction() |
| cpp/ql/lib/semmle/code/cpp/Element.qll | Adds new isAffectedByMacro(MacroInvocation) predicate |
| cpp/ql/test/library-tests/ir/ir/ir.cpp | Adds test cases for assertion IR generation including templates and shadowed variables |
| cpp/ql/test/library-tests/ir/ir/raw_ir.expected | Expected IR output for assertion tests |
| cpp/ql/test/library-tests/ir/ir/aliased_ir.expected | Expected aliased IR output for assertion tests |
| cpp/ql/test/library-tests/ir/ir/PrintAST.expected | Expected AST output for assertion tests |
Comments suppressed due to low confidence (1)
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedStmt.qll:330
- The comment has a typo: "synthedsize" should be "synthesize".
// we synthedsize the check that would have occurred.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedStmt.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedStmt.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedAssertion.qll
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedAssertion.qll
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedAssertion.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedAssertion.qll
Show resolved
Hide resolved
4370bc6 to
4f4baee
Compare
| int shadowed = x; | ||
| assert(shadowed > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the idea behind this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that I don't to implement local name resolution for C/C++ in QL 😅 So I simplified the necessary name resolution by requiring that there's only 1 local variable with the given name in the enclosing function (see the unique here which achieves this).
| /** Holds if this element is affected by the expansion of `mi`. */ | ||
| predicate isAffectedByMacro(MacroInvocation mi) { | ||
| affectedbymacroexpansion(underlyingElement(this), unresolveElement(mi)) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this might be potentially confusing, because isAffectedByMacro/0 is not implemented as affectedbymacroexpansion(underlyingElement(this), _).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Do you have any suggestion for a better QLDoc?
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedStmt.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedStmt.qll
Outdated
Show resolved
Hide resolved
| private predicate macroInvocationLocation(int startline, Function f, MacroInvocation mi) { | ||
| mi.getMacroName() = ["assert", "__analysis_assume"] and | ||
| mi.getLocation().hasLocationInfo(_, startline, _, _, _) and | ||
| f.getEntryPoint().isAffectedByMacro(mi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this line achieve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This relates the function f to the macro invocation mi. Otherwise, we don't know which function mi occurs in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so why does this look at - what is effectively - the whole body of the function, and not just at the statement we're interested in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do that as well. I guess we could do something like:
exists(Stmt stmt |
stmt.isAffectedByMacro(mi) and
f = stmt.getEnclosingFunction()
)I only did the above since that didn't need me to existentially quantify a Stmt. But if you prefer the other thing I guess that's just as fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that really resolves the thing I have trouble wrapping my head around: Why do we need to refer to the function at all here? Why isn't it sufficient to look at the statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The statement isn't enough because of templates. Consider this scenario (let's ignore the uninstantiated function template for the discussion since we don't generate IR for that anyway):
1 | #define assert(x) ((void)0)
2 |
3 | template<typename T>
4 | void f(int x) {
5 | assert(x > 10);
6 | }
7 |
8 | // two arbitrary instantiations
9 | template void f<short>();
10| template void f<long>();Now there are two (instantiated) function templates: f<short> and f<long> and they each have an ExprStmt for ((void)0) that we want to generate an assertion in the IR for.
To make it easy to figure out where we should place the assertion in the IR we require that the only thing on the line is the assertion.
However, the two ExprStmts have the same location. So the unique aggregate here require that there's a unique statement in the enclosing function where the macro is expanded.
So even though there are two ExprStmts on line 5 the condition "there is only 1 statement on the line" is still satisfied because we require it per-function. We could remove the function stuff, but then we wouldn't generate IR for the assertion since there would be two statements on the line (corresponding to the two ExprStmts from the two instantiations).
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes sense.
…slatedStmt.qll Co-authored-by: Jeroen Ketema <[email protected]>
…slatedStmt.qll Co-authored-by: Jeroen Ketema <[email protected]>
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedAssertion.qll
Outdated
Show resolved
Hide resolved
…slatedAssertion.qll Co-authored-by: Jeroen Ketema <[email protected]>
| unique(StmtParent p, int startline, Function f | | ||
| macroInvocationLocation(startline, f, mi) and | ||
| stmtParentLocation(startline, f, p) and | ||
| // Also do not count the elements from the expanded macro. i.e., when checking | ||
| // if `assert(x)` is the only thing on the line we do not count the | ||
| // generated `((void)0)` expression. | ||
| not p = mi.getAnExpandedElement() | ||
| | | ||
| p | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for using StmtParent here? what's the relevance of StmtExprs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to capture the notion "this is the only entity we'll be generating IR for on this line (in this function)" to have a meaningful place to insert the assertion, and StmtParent (i.e., the union of Expr and Stmt) is the most general class which has an enclosing function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would change if we just check that there's just a single macro that affects the statement we're interested in? Instead of doing all this line business?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
single macro = single macro of the kind we're interested in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how you would do that without doing all the line business. If you have:
#define assert(x)
void test(int x) {
assert(x < 10);
}I'm not sure how you relate the EmptyStmt to the MacroInvocation for assert(x < 10) without resorting to line matching. The affectedbymacroexpansion extensional on this DB only relates the macro invocation to the enclosing block statement:
Maybe that's a bug? If it related it to the macro invocation to the EmptyStmt (which is the statement we use as the TranslatedElement) then we could probably avoid line matching.
In C/C++, assertions are often done via a macro defined like:
where
/* implementation defined */represents the actual operation that implements the assertion in a debug build.However, in a release build (i.e., when
NDEBUGis defined) then no check is performed. This is great for performance, but it means the CodeQL database has no way of observing these conditions. And these conditions often help us remove FPs (i.e., a null check or an index validation prior to a dereference).This PR adds support for identifying (a small subset of) assertions by generating IR corresponding to the check which would have been performed had assertions been enabled (the rationale being basically the same as what Schack wrote for Java here).
This PR only covers a small subset of assertions since we only have the assertion as text since this is a macro argument. So we have to parse that macro argument in QL 😭. Because of this, I've limited this PR to only genearte IR for an assertion of the form
E op EwhereEis an integer constant, or a local variable, andopis=,!=,<,>,<=, or>=. (Locally, I have a follow-up PR to add support for negations, disjunctions, and conjunctions.)As I didn't feel like implementing all of C++'s conversion rules the generated IR will also not be totally conversion-correct. For example, in an expression like
x < ywherexisintandyisunsigned intthere would normally be a signed-to-unsigned conversion onxbut currently we simply generate a comparison between types of different types. I don't imagine this will be a problem in practice, though.Commit-by-commit review recommended.
The three new alerts look genuine. They arise because we now realize that there's a suspicious looking assertion here which ought to have been
fhSize < dstCapacity.